The Natural Stories Corpus

Authors

Richard Futrell

Edward Gibson

Hal Tily

Idan Blank

Anastasia Vishnevetsky

Steve Piantadosi

Evelina Fedorenko

Abstract

It is now a common practice to compare models of human language processing by predicting participant reactions (such as reading times) to corpora consisting of rich naturalistic linguistic materials. However, many of the corpora used in these studies are based on naturalistic text and thus do not contain many of the low-frequency syntactic constructions that are often required to distinguish processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected parse trees and includes self-paced reading time data. Here we give an overview of the content of the corpus and release the data.

Similar resources

Learning from Stories: Using Natural Communication to Train Believable Agents

In this work we introduce Quixote, a system that allows non-programmers to train believable virtual agents and robots using the sociocultural knowledge present in stories. Quixote uses a corpus of exemplar stories to engineer a reward function that can be used to train virtual agents to exhibit desired behaviors using reinforcement learning. We show the effectiveness of our system with a case s...

Towards Characterization of Actor Evolution and Interactions in News Corpora

The natural way to model a news corpus is as a directed graph where stories are linked to one another through a variety of relationships. We formalize this notion by viewing each news story as a set of actors, and by viewing links between stories as transformations these actors go through. We propose and model a simple and comprehensive set of transformations: create, merge, split, continue, an...

Automatic Text Generation by Learning from Literary Structures

Most of the work dealing with automatic story production is based on a generic architecture for text generation; however, the resulting stories still lack a style that can be called literary. We believe that in order to generate automatically stories that could be compared with those by human authors, a specific methodology for fiction text generation should be defined. We also believe that it ...

Understanding Mental States in Natural Language

Understanding mental states in narratives is an important aspect of human language comprehension. By “mental states” we refer to beliefs, states of knowledge, points of view, and suppositions, all of which may change over time. In this paper, we propose an approach for automatically extracting and understanding multiple mental states in stories. Our model consists of two parts: (1) a parser tha...

A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

Representation and learning of commonsense knowledge is one of the foundational problems in the quest to enable deep language understanding. This issue is particularly challenging for understanding causal and correlational relationships between events. While this topic has received a lot of interest in the NLP community, research has been hindered by the lack of a proper evaluation framework. T...

InScript: Narrative texts annotated with script information

This paper presents the InScript corpus (Narrative Texts Instantiating Script structure). InScript is a corpus of 1,000 stories centered around 10 different scenarios. Verbs and noun phrases are annotated with event and participant types, respectively. Additionally, the text is annotated with coreference information. The corpus shows rich lexical variation and will serve as a unique resource fo...

Journal: CoRR

Volume: abs/1708.05763

Publication date: 2017
